A transformer model operates on time series or sequential data using a form of attention: it identifies past states/tokens that are particularly related to the current token and gives these extra weight when predicting future tokens. This is like the human ability to hear or read a sentence such as "The cat sat in the deep orange glow of sunset and licked its fur" – when you reach the word 'licked', your mind instantly pulls out the word 'cat' as related and uses it to make sense of the current point in the sentence. We use rich semantic structures to do this, but transformer models compute a vector similarity between a 'key' and a 'query' for each token, where the key models the kind of thing that the input token is, and the query the kind of thing it would like to connect with.
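As a rough sketch of the key/query mechanism, the short NumPy example below computes single-head scaled dot-product attention over a toy sequence; the dimensions, random weight matrices, and the attention function itself are illustrative assumptions rather than code from any particular model.

    # Minimal single-head scaled dot-product attention (illustrative sketch).
    # In a real transformer the projection matrices W_q, W_k, W_v are learned.
    import numpy as np

    def attention(X, W_q, W_k, W_v):
        Q = X @ W_q                      # queries: what each token looks for
        K = X @ W_k                      # keys: what each token offers
        V = X @ W_v                      # values: the content to be mixed
        scores = Q @ K.T / np.sqrt(K.shape[-1])  # query-key similarity
        # Softmax over the sequence; decoder-style models additionally
        # mask out future positions so each token attends only to the past.
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        return weights @ V               # weighted sum of related tokens

    rng = np.random.default_rng(0)
    X = rng.standard_normal((12, 8))     # 12 tokens, embedding size 8
    W_q, W_k, W_v = (rng.standard_normal((8, 8)) for _ in range(3))
    out = attention(X, W_q, W_k, W_v)    # out[i] blends tokens related to token i

In the sentence example above, the row of weights for 'licked' would place high probability on 'cat', so the output vector for 'licked' is largely built from the representation of 'cat'.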
Used in Chap. 14: pages 231, 237; Chap. 19: pages 318, 319; Chap. 22: page 369; Chap. 23: page 392; Chap. 24: page 400
Also known as transformer networks